Calgary has seen a surge in real estate and construction in recent years, with residential building permit value jumping 55% in 2021, rebounding from the low during the COVID-19 outbreak in 2020 despite a higher than national average unemployment rate. The residential permit value has been increasing rapidly since 2021, while the average hourly wage rate remains stagnant, making housing unaffordable for many Calgary residents.
This project will explore how the residential building permit value varies in different communities within Calgary. I am focusing on the New-built Single-family and Commercial residential building permit and residential improvement building permit issued between 2018- OCT 2, 2023. I would like to see if the value of these permits will reflect the housing trend (large apartment complex versus single-family home versus home improvement) in each community and how the trends change before, during, and after the COVID-19 outbreak from 2018-2023. Since there are over 300 small communities around Calgary, we will divide those communities into eight zones (center, north, south, east, west, northwest, northeast, and southeast) as the image illustrates.
import pandas as pd
import numpy as np
import matplotlib as mpl
import plotly as plotly
plotly.__version__
'5.9.0'
There are two datasets that I used from the Open Calgary website. My main dataset is Building Permits, and my secondary dataset is Community Crime and Disorder Statistics(2012-2019), which I use to create the community-to-sector map dictionary. I filter the Building Permits dataset into three datasets based on three types of residential building permits: New-built Commercial/Multi-Family, New-built Single-Family, and Residential improvement.
Data wrangling tasks are dropping unused columns, changing some column types, creating community-to-sector maps, adding sector columns to the three residential permit datasets, and removing rows with missing information.
Data visualization tasks are grouping the data by year and sector, calculating the yearly permit value, preplot the data for verification, and creating tables for plotting stack bar charts in Plotly.
I use the built-in query service on Open Data Calgary to get three datasets for each residential permit type: New-built Commercial/Multi-Family, New-built Single-Family, and Residential improvement. I also downloaded Community Crime and Disorder Statistics(2012-2019) without query filtering.
Query description:
IssuedDate,PermitClassMapped, PermitType, WorkClass, HousingUnitsBuilding_Permits6625L5Multi.csv, Building_Permits131191L5Single.csv, Building_Permits179416L3Improve.csv# Load Commerical/Multi-Family permit, select columns, convert float Ward Boundaries column to string Ward column.
source_df = pd.read_csv('Building_Permits6625L5Multi.csv')
Multi_df = source_df[['IssuedDate', 'EstProjectCost', 'HousingUnits','CommunityName','Ward Boundaries']]
Ward = Multi_df['Ward Boundaries'].astype(str)
pd.DataFrame(Ward)
Ward = Ward.rename("Ward")
Multi_df = pd.concat([Multi_df,Ward],axis =1)
Multi_df = Multi_df.drop("Ward Boundaries",axis='columns')
display(Multi_df.head())
# repeat above process with Single Familty permit. I don't need HousingUnits because this count as 1 unit.
sources_df = pd.read_csv('Building_Permits131191L5Single.csv')
Single_df = sources_df[['IssuedDate', 'EstProjectCost','CommunityName','Ward Boundaries']]
Wards = Single_df['Ward Boundaries'].astype(str)
pd.DataFrame(Wards)
Wards = Wards.rename("Ward")
Single_df = pd.concat([Single_df,Wards],axis =1)
Single_df = Single_df.drop("Ward Boundaries",axis='columns')
display(Single_df.head())
# repeat above process with Residential Improvement permit. I don't need HousingUnits because this count as 0 unit.
sourcei_df = pd.read_csv('Building_Permits179416L3Improve.csv')
Improv_df = sourcei_df[['IssuedDate', 'EstProjectCost','CommunityName','Ward Boundaries']]
Wardi = Improv_df['Ward Boundaries'].astype(str)
pd.DataFrame(Wardi)
Wardi = Wardi.rename("Ward")
Improv_df = pd.concat([Improv_df,Wardi],axis =1)
Improv_df = Improv_df.drop("Ward Boundaries",axis='columns')
display(Improv_df.head())
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | |
|---|---|---|---|---|---|
| 0 | 10/2/23 | 580,925 | 2 | SAGE HILL | nan |
| 1 | 10/2/23 | 1,409,387 | 4 | SAGE HILL | nan |
| 2 | 10/2/23 | 1,366,160 | 2 | PARKHILL | nan |
| 3 | 10/2/23 | 1,926,194 | 8 | MAHOGANY | nan |
| 4 | 10/2/23 | 969,508 | 4 | MAHOGANY | nan |
| IssuedDate | EstProjectCost | CommunityName | Ward | |
|---|---|---|---|---|
| 0 | 2022/11/28 | 373,884 | CAMBRIAN HEIGHTS | nan |
| 1 | 2021/11/10 | NaN | WEST HILLHURST | 7.0 |
| 2 | 2022/11/09 | 375,787 | BELMONT | nan |
| 3 | 2022/05/13 | 315,574 | AMBLETON | 3.0 |
| 4 | 2022/11/30 | 259,719 | MAHOGANY | nan |
| IssuedDate | EstProjectCost | CommunityName | Ward | |
|---|---|---|---|---|
| 0 | 10/2/23 | 3,942 | CARRINGTON | nan |
| 1 | 10/2/23 | 5,000 | MAYLAND HEIGHTS | nan |
| 2 | 10/2/23 | 55,549 | CRANSTON | nan |
| 3 | 10/2/23 | 3,000 | GLENDALE | nan |
| 4 | 10/2/23 | 46,677 | AMBLETON | nan |
Now, I check if the EstProjectCost column of each dataset has NAN or 0. If so, I will remove them. Only single-family and residential improvement permit dataset has NAN in their EstProjectCost column. Then, I checked the CommunityName column of each dataset and found no NAN, so I am good to go.
Multi_df['EstProjectCost'].isnull().any()
Single_df['EstProjectCost'].isnull().any()
Improv_df['EstProjectCost'].isnull().any()
Improv_df = Improv_df.dropna(subset=['EstProjectCost'])
Single_df = Single_df.dropna(subset=['EstProjectCost'])
Multi_df['CommunityName'].isnull().any()
Single_df['CommunityName'].isnull().any()
Improv_df['CommunityName'].isnull().any()
Single_df['EstProjectCost'].isnull().any()
Improv_df['EstProjectCost'].isnull().any()
False
Now I have to map community name to eight sectors, which is a bit tricky than mapping ward boundaries to eight sectors, because there are only 14 wards but over 300 communities. But there are a lot of missing data in the ward coulmn and none in community name I have no choice. Luckily, I can extract the communities and sectors map from Community Crime and Disorder Statistics 2012-2019 dataset which contain most of the communites in my Building Permit dataset. I filter unique community name along with Sector from the dataset, check for missing values, and change the Community Name columns to match the column name in my main dataset so I can merge them.
#Update Community to Sectors Map
Crime_df = pd.read_csv('Community_Crime_and_Disorder_Statistics__2012-2019_.csv')
CrimeUnique_df = Crime_df.drop_duplicates(subset = "Community Name")
Commumap_df = CrimeUnique_df[['Community Name','Sector']]
Commumap_df = Commumap_df.rename(columns={"Community Name": "CommunityName"})
Commumap_df.isnull().any()
display(Commumap_df.head())
# Add sector column by merging with community map using community name as merge key
MultiWCommu_df = Multi_df.merge(Commumap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
display(MultiWCommu_df)
| CommunityName | Sector | |
|---|---|---|
| 0 | THORNCLIFFE | NORTH |
| 1 | WOODBINE | SOUTH |
| 2 | WILLOW PARK | SOUTH |
| 4 | LINCOLN PARK | WEST |
| 5 | RAMSAY | CENTRE |
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|---|
| 3809 | 7/10/13 | 849,474 | 4 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 3451 | 4/11/14 | 16,175,000 | 100 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 2722 | 9/7/16 | 2,223,322 | 23 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 1556 | 2/21/20 | 875,000 | 4 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| 1658 | 11/5/19 | 900,000 | 4 | ALBERT PARK/RADISSON HEIGHTS | 12.0 | EAST |
| ... | ... | ... | ... | ... | ... | ... |
| 127 | 6/26/23 | 834,684 | 4 | YORKVILLE | nan | NaN |
| 1675 | 10/25/19 | 1,135,741 | 4 | YORKVILLE | 6.0 | NaN |
| 129 | 6/26/23 | 827,467 | 4 | YORKVILLE | nan | NaN |
| 1814 | 6/19/19 | 1,114,523 | 4 | YORKVILLE | 6.0 | NaN |
| 1848 | 5/16/19 | 1,135,741 | 4 | YORKVILLE | 6.0 | NaN |
6626 rows × 6 columns
Seem like the Community to sector map still missing some community Name, I have to take a look and manaually update it. Looks like I am missing 10 community. So I search for missing community update it. So now I have created a new cvs file, Commap.cvs and use it to add sector column to all three dataset.
# Update Community map manually and reload it
Commu2SectorMap_df = pd.read_csv('Commumap.csv')
# Merge again and test if there still NaN in sector column
MultiWCommu_df = Multi_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
narow = MultiWCommu_df.loc[MultiWCommu_df.Sector.isnull()]
display(narow)
SingleWCommu_df = Single_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
narows = SingleWCommu_df.loc[SingleWCommu_df.Sector.isnull()]
display(narows)
ImprovWCommu_df = Improv_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
narowi = ImprovWCommu_df.loc[ImprovWCommu_df.Sector.isnull()]
display(narowi)
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|---|
| 1359 | 11/6/20 | 770,785 | 4 | Undefined | 3.0 | NaN |
| 1481 | 6/23/20 | 1,992,860 | 10 | Undefined | 3.0 | NaN |
| 953 | 12/3/21 | 1,064,277 | 4 | Undefined | 4.0 | NaN |
| 1345 | 12/2/20 | 9,602,466 | 64 | Undefined | 3.0 | NaN |
| 1984 | 12/18/18 | 477,806 | 6 | Undefined | 6.0 | NaN |
| 1264 | 3/19/21 | 1,542,523 | 10 | Undefined | 6.0 | NaN |
| 2491 | 7/4/17 | 20,261,732 | 162 | Undefined | 4.0 | NaN |
| 1339 | 12/16/20 | 1,101,602 | 6 | Undefined | 3.0 | NaN |
| 939 | 12/13/21 | 1,546,871 | 6 | Undefined | 4.0 | NaN |
| 1268 | 3/16/21 | 907,963 | 5 | Undefined | 11.0 | NaN |
| 1272 | 3/9/21 | 809,223 | 5 | Undefined | 11.0 | NaN |
| 1986 | 12/18/18 | 567,662 | 8 | Undefined | 6.0 | NaN |
| 1301 | 2/9/21 | 916,680 | 5 | Undefined | 11.0 | NaN |
| 1140 | 6/15/21 | 13,133,900 | 60 | Undefined | 5.0 | NaN |
| 1987 | 12/18/18 | 325,791 | 4 | Undefined | 6.0 | NaN |
| 928 | 12/24/21 | 1,044,115 | 4 | Undefined | 4.0 | NaN |
| 1170 | 5/31/21 | 10,692,300 | 48 | Undefined | 5.0 | NaN |
| 2301 | 3/19/18 | 13,107,065 | 83 | Undefined | 4.0 | NaN |
| 1160 | 6/4/21 | 762,473 | 4 | Undefined | 6.0 | NaN |
| 1371 | 10/27/20 | 1,124,995 | 6 | Undefined | 3.0 | NaN |
| 843 | 3/17/22 | 1,515,280 | 9 | Undefined | 4.0 | NaN |
| 757 | 5/4/22 | 2,833,228 | 12 | Undefined | 4.0 | NaN |
| 1370 | 10/30/20 | 770,785 | 4 | Undefined | 3.0 | NaN |
| 1206 | 5/10/21 | 1,479,205 | 8 | Undefined | 3.0 | NaN |
| 937 | 12/14/21 | 1,546,871 | 6 | Undefined | 4.0 | NaN |
| 1207 | 5/7/21 | 1,479,205 | 8 | Undefined | 3.0 | NaN |
| 838 | 3/25/22 | 1,999,630 | 15 | Undefined | 4.0 | NaN |
| 1443 | 8/12/20 | 1,134,102 | 7 | Undefined | 6.0 | NaN |
| 950 | 12/7/21 | 1,827,288 | 7 | Undefined | 4.0 | NaN |
| 1447 | 8/10/20 | 1,025,547 | 6 | Undefined | 6.0 | NaN |
| 904 | 1/26/22 | 11,742,885 | 47 | Undefined | 4.0 | NaN |
| 905 | 1/26/22 | 6,118,078 | 47 | Undefined | 4.0 | NaN |
| 1365 | 11/4/20 | 947,765 | 5 | Undefined | 3.0 | NaN |
| 800 | 4/22/22 | 3,244,257 | 12 | Undefined | 4.0 | NaN |
| 2007 | 12/5/18 | 1,500,000 | 4 | Undefined | 7.0 | NaN |
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 4854 | 2023/07/18 | 326,677 | Undefined | nan | NaN |
| 3598 | 2023/06/20 | 427,996 | Undefined | nan | NaN |
| 2421 | 2023/05/31 | 262,511 | Undefined | nan | NaN |
| 529 | 2023/07/26 | 339,427 | Undefined | nan | NaN |
| 3021 | 2023/06/09 | 262,511 | Undefined | nan | NaN |
| 1953 | 2023/06/12 | 274,500 | Undefined | nan | NaN |
| 2069 | 2023/06/01 | 275,962 | Undefined | nan | NaN |
| 2775 | 2023/07/27 | 350,085 | Undefined | nan | NaN |
| 1654 | 2023/03/23 | 202,247 | Undefined | nan | NaN |
| 4950 | 2023/07/18 | 312,069 | Undefined | nan | NaN |
| 2458 | 2023/06/21 | 274,500 | Undefined | nan | NaN |
| 1223 | 2023/03/03 | 166,041 | Undefined | nan | NaN |
| 348 | 2023/07/26 | 339,427 | Undefined | nan | NaN |
| 297 | 2023/07/20 | 326,506 | Undefined | nan | NaN |
| 1850 | 2023/07/18 | 329,513 | Undefined | nan | NaN |
| 2655 | 2023/03/23 | 355,375 | Undefined | nan | NaN |
| 5141 | 2023/07/18 | 312,473 | Undefined | nan | NaN |
| 4934 | 2023/06/10 | 422,102 | Undefined | nan | NaN |
| 1807 | 2023/03/23 | 165,854 | Undefined | nan | NaN |
| 3054 | 2023/03/17 | 165,854 | Undefined | nan | NaN |
| 1864 | 2023/07/18 | 306,008 | Undefined | nan | NaN |
| 300 | 2023/05/31 | 432,076 | Undefined | nan | NaN |
| 3328 | 2023/06/02 | 272,168 | Undefined | nan | NaN |
| 4917 | 2023/06/20 | 272,168 | Undefined | nan | NaN |
| 2419 | 2023/05/31 | 262,511 | Undefined | nan | NaN |
| 2413 | 2023/09/28 | 327,038 | Undefined | nan | NaN |
| 5321 | 2023/06/29 | 271,790 | Undefined | nan | NaN |
| 3633 | 2023/07/17 | 240,560 | Undefined | nan | NaN |
| 3130 | 2023/03/17 | 202,419 | Undefined | nan | NaN |
| 2968 | 2023/07/11 | 356,430 | Undefined | nan | NaN |
| 2063 | 2023/06/15 | 275,530 | Undefined | nan | NaN |
| 5262 | 2023/06/29 | 311,721 | Undefined | nan | NaN |
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 3078 | 6/12/23 | 40,505 | Undefined | nan | NaN |
| 2730 | 6/21/23 | 44,555 | Undefined | nan | NaN |
The Undefined and NaN community name causes the NaN in the column sector. Only the Muti-Commercial dataset has ward information for those Undefined communities. Since the Single-Family dataset has 32 entries with no community name and ward information, and the Residential improvement dataset has 14, I decided to remove those entries. However, since the Muti-Commercial dataset has ward information, I decided to replace Undefined CommunityName with their ward number and add the ward number into my Commu2SectorMap data frame so I can merge them again to get rid of NaN in the sector column.
I also convert the AppliedDate column to Pandas date-time type and the EstProjectCost column to integer.
#replacing the Undefined community name with Ward.
Multi_df.CommunityName =np.where(Multi_df.CommunityName == 'Undefined',
Multi_df['Ward'],Multi_df.CommunityName)
# Create Ward dictinary and add it into the Community to Sector Map
WardDict = {'CommunityName':['1.0','2.0','3.0','4.0','5.0','6.0','7.0','8.0','9.0','10.0','11.0','12.0','13.0','14.0'],
'Sector':['NORTHWEST','NORTH','NORTH','NORTH','NORTHEAST','WEST','CENTRE','CENTRE','EAST','NORTHEAST',
'SOUTH','SOUTHEAST','SOUTH','SOUTH']}
WardDict_df = pd.DataFrame(WardDict)
Commu2SectorMap_df = pd.concat([Commu2SectorMap_df, WardDict_df], ignore_index = True)
# Merge again to get rid of NaN in Sector column of MutiWCommu. Don't need to do this with other 2 since I have to drop it
MultiWCommu_df = Multi_df.merge(Commu2SectorMap_df, on='CommunityName', how='left').sort_values(by='CommunityName')
# Drop the row with Undefined community and missing ward information
SingleWCommu_df = SingleWCommu_df.dropna(subset=['Sector'])
# Drop the row with Undefined community and missing ward information
ImprovWCommu_df = ImprovWCommu_df.dropna(subset=['Sector'])
# Change issue date to date-time type
MultiWCommu_df['IssuedDate'] = pd.to_datetime(MultiWCommu_df['IssuedDate'])
SingleWCommu_df['IssuedDate'] = pd.to_datetime(SingleWCommu_df['IssuedDate'])
ImprovWCommu_df['IssuedDate'] = pd.to_datetime(ImprovWCommu_df['IssuedDate'])
# Change the EstProjectCost column from string to int. I need to remove the comma first so I can convert to int
MultiWCommu_df['EstProjectCost'] = MultiWCommu_df['EstProjectCost'].str.split(',').str.join('').astype(int)
SingleWCommu_df['EstProjectCost'] = SingleWCommu_df['EstProjectCost'].str.split(',').str.join('').astype(int)
ImprovWCommu_df['EstProjectCost'] = ImprovWCommu_df['EstProjectCost'].str.split(',').str.join('').astype(int)
display(MultiWCommu_df)
display(SingleWCommu_df)
display(ImprovWCommu_df)
| IssuedDate | EstProjectCost | HousingUnits | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|---|
| 1268 | 2021-03-16 | 907963 | 5 | 11.0 | 11.0 | SOUTH |
| 1272 | 2021-03-09 | 809223 | 5 | 11.0 | 11.0 | SOUTH |
| 1301 | 2021-02-09 | 916680 | 5 | 11.0 | 11.0 | SOUTH |
| 1206 | 2021-05-10 | 1479205 | 8 | 3.0 | 3.0 | NORTH |
| 1371 | 2020-10-27 | 1124995 | 6 | 3.0 | 3.0 | NORTH |
| ... | ... | ... | ... | ... | ... | ... |
| 1863 | 2019-05-02 | 1158342 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1878 | 2019-04-11 | 681195 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1879 | 2019-04-11 | 1125554 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1848 | 2019-05-16 | 1135741 | 4 | YORKVILLE | 6.0 | SOUTH |
| 1332 | 2020-12-30 | 1461013 | 8 | YORKVILLE | 6.0 | SOUTH |
6626 rows × 6 columns
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 80714 | 2011-02-25 | 151527 | ABBEYDALE | 12.0 | NORTHEAST |
| 80037 | 2011-07-14 | 441365 | ACADIA | 10.0 | SOUTH |
| 91126 | 2013-05-30 | 362893 | ACADIA | 10.0 | SOUTH |
| 106298 | 2016-07-26 | 323962 | ACADIA | 10.0 | SOUTH |
| 103754 | 2015-10-23 | 301952 | ACADIA | 10.0 | SOUTH |
| ... | ... | ... | ... | ... | ... |
| 1463 | 2023-09-21 | 353680 | YORKVILLE | nan | SOUTH |
| 121788 | 2020-11-18 | 193632 | YORKVILLE | 6.0 | SOUTH |
| 5336 | 2023-08-02 | 432043 | YORKVILLE | nan | SOUTH |
| 123918 | 2021-02-02 | 194845 | YORKVILLE | 6.0 | SOUTH |
| 125150 | 2021-06-28 | 432043 | YORKVILLE | 6.0 | SOUTH |
130736 rows × 5 columns
| IssuedDate | EstProjectCost | CommunityName | Ward | Sector | |
|---|---|---|---|---|---|
| 91844 | 2012-10-18 | 15000 | ABBEYDALE | 12.0 | NORTHEAST |
| 2055 | 2023-07-11 | 2000 | ABBEYDALE | nan | NORTHEAST |
| 172338 | 2000-08-30 | 9570 | ABBEYDALE | 12.0 | NORTHEAST |
| 148432 | 2005-05-17 | 9870 | ABBEYDALE | 12.0 | NORTHEAST |
| 72803 | 2015-02-26 | 122500 | ABBEYDALE | 12.0 | NORTHEAST |
| ... | ... | ... | ... | ... | ... |
| 5811 | 2023-02-21 | 28298 | YORKVILLE | nan | SOUTH |
| 14502 | 2022-02-24 | 34798 | YORKVILLE | 6.0 | SOUTH |
| 34746 | 2020-01-28 | 23900 | YORKVILLE | 6.0 | SOUTH |
| 26010 | 2020-11-18 | 12858 | YORKVILLE | 6.0 | SOUTH |
| 24173 | 2021-02-12 | 13755 | YORKVILLE | 6.0 | SOUTH |
175126 rows × 5 columns
Finally, my data-wrangling tasks are a success! Now, I will start to prepare my data for visualization.
# Group by Year so I can get yearly EstProjectCost for each sector
MultiYear = MultiWCommu_df.groupby([MultiWCommu_df.IssuedDate.dt.year,'Sector'],as_index=True)['EstProjectCost'].sum()
display(MultiYear)
SingleYear = SingleWCommu_df.groupby([SingleWCommu_df.IssuedDate.dt.year,'Sector'],as_index=True)['EstProjectCost'].sum()
display(SingleYear)
ImprovYear = ImprovWCommu_df.groupby([ImprovWCommu_df.IssuedDate.dt.year,'Sector'],as_index=True)['EstProjectCost'].sum()
display(ImprovYear)
IssuedDate Sector
2000 CENTRE 114337522
EAST 16098198
NORTH 7616974
NORTHEAST 7357570
NORTHWEST 25252600
...
2023 NORTHEAST 111710577
NORTHWEST 117496901
SOUTH 165090967
SOUTHEAST 137599026
WEST 318603071
Name: EstProjectCost, Length: 189, dtype: int64
IssuedDate Sector
2000 CENTRE 23228708
EAST 1557527
NORTH 91507117
NORTHEAST 84846542
NORTHWEST 194974840
...
2023 NORTHEAST 182845981
NORTHWEST 94970150
SOUTH 233156378
SOUTHEAST 184531020
WEST 56401500
Name: EstProjectCost, Length: 192, dtype: int64
IssuedDate Sector
2000 CENTRE 12796193
EAST 1970782
NORTH 5091377
NORTHEAST 4867353
NORTHWEST 7278539
...
2023 NORTHEAST 54223545
NORTHWEST 35124127
SOUTH 57943309
SOUTHEAST 36279922
WEST 26458502
Name: EstProjectCost, Length: 192, dtype: int64
I will first plot them seperately to see the overall scale.
import matplotlib.pyplot as plt
plt.suptitle('Muti-Commercial Residential Permit value by sector')
plt.subplot(2,3,1)
MultiYear[2023].plot.bar()
plt.title('2023')
plt.xticks(visible=False)
plt.xlabel('')
plt.ylabel('Permit value $')
plt.subplot(2,3,2)
MultiYear[2022].plot.bar()
plt.title('2022')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,3)
MultiYear[2021].plot.bar()
plt.title('2021')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,4)
MultiYear[2020].plot.bar()
plt.title('2020')
plt.ylabel('Permit value $')
plt.subplot(2,3,5)
MultiYear[2019].plot.bar()
plt.title('2019')
plt.subplot(2,3,6)
MultiYear[2018].plot.bar()
plt.title('2018')
plt.tight_layout()
plt.suptitle('Single-Family Residential Permit value by sector')
plt.subplot(2,3,1)
SingleYear[2023].plot.bar()
plt.title('2023')
plt.xticks(visible=False)
plt.xlabel('')
plt.ylabel('Permit value $')
plt.subplot(2,3,2)
SingleYear[2022].plot.bar()
plt.title('2022')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,3)
SingleYear[2021].plot.bar()
plt.title('2021')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,4)
SingleYear[2020].plot.bar()
plt.title('2020')
plt.ylabel('Permit value $')
plt.subplot(2,3,5)
SingleYear[2019].plot.bar()
plt.title('2019')
plt.subplot(2,3,6)
SingleYear[2018].plot.bar()
plt.title('2018')
plt.tight_layout()
plt.suptitle('Residential Improvement Permit value by sector')
plt.subplot(2,3,1)
ImprovYear[2023].plot.bar()
plt.title('2023')
plt.xticks(visible=False)
plt.xlabel('')
plt.ylabel('Permit value $')
plt.subplot(2,3,2)
ImprovYear[2022].plot.bar()
plt.title('2022')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,3)
ImprovYear[2021].plot.bar()
plt.title('2021')
plt.xticks(visible=False)
plt.xlabel('')
plt.subplot(2,3,4)
ImprovYear[2020].plot.bar()
plt.title('2020')
plt.ylabel('Permit value $')
plt.subplot(2,3,5)
ImprovYear[2019].plot.bar()
plt.title('2019')
plt.subplot(2,3,6)
ImprovYear[2018].plot.bar()
plt.title('2018')
plt.tight_layout()
Now, I will create tables for each year with three columns(Sector, Residential Permit Type, Total Permit Value) and 24 rows and plot the permit value of three permit types together in one plot for each year.
I will also create line plots to show each type of permit value over time for the 8 eight sectors. To do this, I will create tables for each permit type with three columns( Year, Sector, Total Permit Value) and 48 rows.
S2018 = SingleYear[2018][0:8]
S2019 = SingleYear[2019][0:8]
S2020 = SingleYear[2020][0:8]
S2021 = SingleYear[2021][0:8]
S2022 = SingleYear[2022][0:8]
S2023 = SingleYear[2023][0:8]
Svec = ['Single-Family']*8
M2018 = MultiYear[2018][0:8]
M2019 = MultiYear[2019][0:8]
M2020 = MultiYear[2020][0:8]
M2021 = MultiYear[2021][0:8]
M2022 = MultiYear[2022][0:8]
M2023 = MultiYear[2023][0:8]
Mvec = ['Muti-Commercial']*8
I2018 = ImprovYear[2018][0:8]
I2019 = ImprovYear[2019][0:8]
I2020 = ImprovYear[2020][0:8]
I2021 = ImprovYear[2021][0:8]
I2022 = ImprovYear[2022][0:8]
I2023 = ImprovYear[2023][0:8]
Ivec = ['Residential Improvement']*8
ALLsector_vec = ['CENTRE','EAST','NORTH','NORTHEAST','NORTHWEST','SOUTH','SOUTHEAST','WEST']*3
ALLtype_vec = Svec + Mvec + Ivec
ALLvalue_2018 = pd.concat([S2018,M2018,I2018], axis = 0)
ALLvalue_2018list = ALLvalue_2018.tolist()
Table2018 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2018list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2019 = pd.concat([S2019,M2019,I2019], axis = 0)
ALLvalue_2019list = ALLvalue_2019.tolist()
Table2019 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2019list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2020 = pd.concat([S2020,M2020,I2020], axis = 0)
ALLvalue_2020list = ALLvalue_2020.tolist()
Table2020 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2020list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2021 = pd.concat([S2021,M2021,I2021], axis = 0)
ALLvalue_2021list = ALLvalue_2021.tolist()
Table2021 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2021list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2022 = pd.concat([S2022,M2022,I2022], axis = 0)
ALLvalue_2022list = ALLvalue_2022.tolist()
Table2022 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2022list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
ALLvalue_2023 = pd.concat([S2023,M2023,I2023], axis = 0)
ALLvalue_2023list = ALLvalue_2023.tolist()
Table2023 = pd.DataFrame(list(zip(ALLsector_vec,ALLtype_vec,ALLvalue_2023list)),
columns =['Sector','Residential Permit Type','Total Permit Value'])
import plotly.express as px
from plotly.subplots import make_subplots
# plot the long (tidy) dataframe
fig18 = px.bar(Table2018, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2018 Permit Value", barmode='stack')
fig18.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig18.show()
/Users/annesunrawee/opt/anaconda3/lib/python3.9/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.25.0 is required for this version of SciPy (detected version 1.26.1
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
fig19 = px.bar(Table2019, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2019 Permit Value", barmode='stack')
fig19.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig19.show()
fig20 = px.bar(Table2020, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2020 Permit Value", barmode='stack')
fig20.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig20.show()
fig21 = px.bar(Table2021, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2021 Permit Value", barmode='stack')
fig21.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig21.show()
fig22 = px.bar(Table2022, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2022 Permit Value", barmode='stack')
fig22.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig22.show()
fig23 = px.bar(Table2023, x="Sector", y="Total Permit Value", color="Residential Permit Type",text_auto='.4s', title="2023 Permit Value", barmode='stack')
fig23.update_layout(xaxis_title='Sector', yaxis=dict(tickformat="$",))
fig23.show()
ALLsectoryear_vec = ['CENTRE','EAST','NORTH','NORTHEAST','NORTHWEST','SOUTH','SOUTHEAST','WEST']*6
Allyear_vec = ['2018']*8 + ['2019']*8 + ['2020']*8 + ['2021']*8 + ['2022']*8 + ['2023']*8
ALLvalue_S = pd.concat([S2018,S2019,S2020,S2021,S2022,S2023], axis = 0)
ALLvalue_Slist = ALLvalue_S.tolist()
TableS = pd.DataFrame(list(zip(Allyear_vec,ALLsectoryear_vec,ALLvalue_Slist)),
columns =['Year','Sector','Total Permit Value'])
ALLvalue_M = pd.concat([M2018,M2019,M2020,M2021,M2022,M2023], axis = 0)
ALLvalue_Mlist = ALLvalue_M.tolist()
TableM = pd.DataFrame(list(zip(Allyear_vec,ALLsectoryear_vec,ALLvalue_Mlist)),
columns =['Year','Sector','Total Permit Value'])
ALLvalue_I = pd.concat([I2018,I2019,I2020,I2021,I2022,I2023], axis = 0)
ALLvalue_Ilist = ALLvalue_I.tolist()
TableI = pd.DataFrame(list(zip(Allyear_vec,ALLsectoryear_vec,ALLvalue_Ilist)),
columns =['Year','Sector','Total Permit Value'])
figS = px.line(TableS, x="Year", y="Total Permit Value", color="Sector",text ='Total Permit Value', title="Single-Family Permit Value")
figS.update_layout(xaxis_title='Year', yaxis=dict(tickformat="$",))
figS.show()
figM = px.line(TableM, x="Year", y="Total Permit Value", color="Sector",text ='Total Permit Value', title="Multi-Commercial Permit Value")
figM.update_layout(xaxis_title='Year', yaxis=dict(tickformat="$",))
figM.show()
figI = px.line(TableI, x="Year", y="Total Permit Value", color="Sector",text ='Total Permit Value', title="Residential Improvement Permit Value")
figI.update_layout(xaxis_title='Year', yaxis=dict(tickformat="$",))
figI.show()
I will summarize the findings to my guiding question: How do the residential building permit value vary in different communities within Calgary?
Findings from bar plots visualization
The EAST sector has the smallest permit value of all types because it is the smallest sector. However, the WEST sector, the second smallest sector, has a much higher permit value than the EAST sector.
The CENTER sector has the highest Commercial permit value, with several multimillion-dollar projects concentrated in the downtown Beltline area in 2018, 2019, and 2021.
The NORTHWEST sector shares the same trend as the CENTER sector, where the commercial residential building permit value dominates the sector. The concentration of large-scale commercial residential buildings surrounding the University of Calgary campus is a contributing factor to the high commercial residential permit value in the sector.
The NORTHWEST sector had a very high commercial residential building permit value in 2018. Part of it is due to two constructions of apartment/condo complex projects in the University District with over $123 M combined in value.
The NORTH and NORTHEAST sectors have led in single-family permit value since 2018, with many new communities, such as Ambleton, Lewisburg, Glacier Ridge, Cornerstone, and Homestead. SOUTH and SOUTHEAST sectors start to catch up with them in 2021, with new neighborhoods such as Seton, Rangeview, Hotchkiss, Yorkville, Belmont, Pine Creek, Wolf Willow, and Alpine Park.
Findings from line plots visualization
Residential Improvement permit value increased during the COVID-19 outbreak in 2020 for all sectors except the CENTRE sector, where the permit value slightly decreased due to the high concentration of large-scale commercial residential buildings. The SOUTH sector dominates in the residential improvement permit value throughout and peaks in 2021 and 2022.
During the COVID-19 outbreak, 2020 saw a drop in single-family permit value in 4 sectors (NORTH, SOUTHEAST, CENTRE, and WEST) while slightly increasing in the other four sectors, which indicates there is still demand for single-family homes due to the need for social distancing during the COVID-19 outbreak. Commercial residential building permit value dropped in 2020 in 6 sectors, with a sharp drop in the CENTRE and NORTH sectors, while the other two sectors (SOUTHEAST and WEST) increased.
Both single-family and commercial residential building permit values increased in 2021 across all sectors, except for a slight drop in the commercial residential building permit value in the SOUTH sector. However, the SOUTH has a very high single-family permit value in 2021.
Single-family permit value drops in 2023 in all sectors. Commercial residential building permit values decreased in most sectors except increasing in the two smallest sectors, the WEST and the EAST sectors.
1) The City of Calgary (2023). Building Permits.https://data.calgary.ca/Business-and-Economic-Activity/Building-Permits/c2es-76ed.
2) The City of Calgary (2023). Community Crime and Disorder Statistics (2012-2019).https://data.calgary.ca/Health-and-Safety/Community-Crime-and-Disorder-Statistics-2012-2019-/848s-4m4z.
3) The City of Calgary Market Trend reports: Residential/Multi-residential. 2023 Property Assessment Residential Market Trends. https://www.calgary.ca/property-owners/assessment/roll-highlights.html#:~:text=The%20typical%20market%20value%20assessment,compared%20to%20%24485%2C000%20in%202022.
4) 2022 Creb Community map. https://www.creb.com/-/media/Public/CREBcom/Housing_Statistics/CITY_OF_CALGARY_MAP_2022.pdf